# Self-supervised learning
Vjepa2 Vitl Fpc64 256
MIT
V-JEPA 2 is a state-of-the-art video understanding model developed by Meta's FAIR team. It extends the pre-training objective of V-JEPA and achieves leading video understanding performance.
Video Processing
Transformers
facebook
109
27
Midnight
MIT
Midnight-12k is a pathology foundation model trained with self-supervised learning on a comparatively small dataset, achieving performance comparable to leading models.
Image Classification
Safetensors English
kaiko-ai
516
4
Izanami Wav2vec2 Large
Other
Japanese wav2vec 2.0 Large model pre-trained on large-scale Japanese TV broadcast audio data
Speech Recognition Japanese
imprt
89
1
Kushinada Hubert Base
Apache-2.0
Japanese speech feature extraction model pre-trained on 62,215 hours of Japanese TV broadcast audio data
Speech Recognition Japanese
imprt
1,922
1
Rnafm
RNA foundation model pre-trained on non-coding RNA data with a masked language modeling (MLM) objective
Protein Model
Safetensors Other
multimolecule
6,791
1
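A minimal embedding sketch for this entry. Both the `multimolecule` package API (`RnaTokenizer`, `RnaFmModel`) and the checkpoint ID `multimolecule/rnafm` are assumptions based on the library's documented usage pattern, not verified here:

```python
# Hypothetical sketch: embedding an ncRNA sequence with RNA-FM.
import torch
from multimolecule import RnaTokenizer, RnaFmModel  # assumed API

model_id = "multimolecule/rnafm"  # assumed checkpoint ID
tokenizer = RnaTokenizer.from_pretrained(model_id)
model = RnaFmModel.from_pretrained(model_id)

inputs = tokenizer("UAGCUUAUCAGACUGAUGUUGA", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embedding = outputs.last_hidden_state.mean(dim=1)  # one vector per sequence
print(embedding.shape)
```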
Voc2vec As Pt
Apache-2.0
voc2vec is a foundation model built on the wav2vec 2.0 framework and designed specifically for non-verbal human vocalization data.
Audio Classification
Transformers English
alkiskoudounas
31
0
Videomaev2 Base
VideoMAEv2-Base is a self-supervised video feature extraction model that employs a dual masking mechanism and is pre-trained on the UnlabeledHybrid-1M dataset.
Video Processing
OpenGVLab
3,565
5
Rnabert
RNABERT is a model pre-trained on non-coding RNA (ncRNA) with Masked Language Modeling (MLM) and Structural Alignment Learning (SAL) objectives.
Molecular Model Other
multimolecule
8,166
4
Ijepa Vitg16 22k
I-JEPA is a self-supervised learning method that predicts the representations of parts of an image from the representations of other parts, without relying on hand-crafted data augmentations or pixel-level reconstruction.
Image Classification
Transformers
facebook
14
3
Ijepa Vith16 1k
I-JEPA is a self-supervised learning method that predicts the representations of parts of an image from the representations of other parts, without relying on hand-crafted data augmentations or pixel-level reconstruction.
Image Classification
Transformers
facebook
153
0
Ijepa Vith14 22k
I-JEPA is a self-supervised learning method that predicts the representations of parts of an image from the representations of other parts, without relying on hand-crafted data augmentations or pixel-level reconstruction.
Image Classification
Transformers
facebook
48
0
Ijepa Vith14 1k
I-JEPA is a self-supervised learning method that predicts the representations of parts of an image from the representations of other parts, without relying on hand-crafted data augmentations or pixel-level reconstruction.
Image Classification
Transformers
facebook
8,239
10
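The four I-JEPA entries above are plain vision encoders without classification heads, so feature extraction is the natural usage. A minimal sketch with the Transformers Auto classes, assuming a release that includes I-JEPA support and that the hub ID follows the `facebook/ijepa_vith14_1k` pattern; `example.jpg` is a placeholder:

```python
# Sketch: global image embeddings from an I-JEPA encoder.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

model_id = "facebook/ijepa_vith14_1k"  # assumed hub ID
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

image = Image.open("example.jpg").convert("RGB")  # placeholder input
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# I-JEPA has no [CLS] token, so mean-pool the patch tokens for a global embedding.
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)
```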
Dinov2 Large Patch14
Apache-2.0
DINOv2 large is a large-scale visual feature extraction model based on self-supervised learning, capable of generating robust image feature representations.
refiners
20
0
Rad Dino
Other
Vision Transformer model trained with self-supervised DINOv2, specifically designed for encoding chest X-ray images
Image Classification
Transformers
microsoft
411.96k
48
Ahma 7B
Apache-2.0
Ahma-7B is a 7-billion-parameter decoder-only Transformer based on Meta's Llama (v1) architecture, pretrained entirely from scratch on Finnish text.
Large Language Model
Transformers Other
Finnish-NLP
201
8
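Since Ahma-7B is a standard Llama-style decoder-only model, it should load through the usual causal-LM classes. A sketch assuming the hub ID is `Finnish-NLP/Ahma-7B`; the prompt is an arbitrary Finnish placeholder:

```python
# Sketch: Finnish text generation with a causal LM checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Finnish-NLP/Ahma-7B"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Suomen kieli on", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```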
Vit Small Patch8 224.lunit Dino
Other
An image classification model based on the Vision Transformer (ViT), trained on 33 million histology image patches with the DINO self-supervised learning method, suitable for pathology image classification tasks.
Image Classification
1aurent
167
1
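This checkpoint is distributed as a timm backbone, so timm's Hugging Face hub integration is the most direct loading path. A sketch assuming the hub ID is `1aurent/vit_small_patch8_224.lunit_dino`; the random tensor stands in for a preprocessed 224x224 tile:

```python
# Sketch: histology tile features from a DINO-pretrained ViT-S/8 via timm.
import timm
import torch

model = timm.create_model(
    "hf-hub:1aurent/vit_small_patch8_224.lunit_dino",  # assumed hub ID
    pretrained=True,
    num_classes=0,  # strip the head, keep the feature extractor
)
model.eval()

x = torch.randn(1, 3, 224, 224)  # placeholder for a normalized 224x224 tile
with torch.no_grad():
    features = model(x)
print(features.shape)
```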
Phikon
Other
Phikon is a self-supervised learning model for histopathology based on iBOT training, primarily used for extracting features from histology image patches.
Image Classification
Transformers English
owkin
741.63k
30
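Phikon is shipped as a ViT backbone, and a common usage pattern is to take the [CLS] token of the last hidden state as the tile-level feature. A sketch assuming the hub ID is `owkin/phikon`; `tile.png` is a placeholder:

```python
# Sketch: extracting a feature vector from a histology tile with Phikon.
import torch
from PIL import Image
from transformers import AutoImageProcessor, ViTModel

model_id = "owkin/phikon"  # assumed hub ID
processor = AutoImageProcessor.from_pretrained(model_id)
model = ViTModel.from_pretrained(model_id, add_pooling_layer=False)

tile = Image.open("tile.png").convert("RGB")  # placeholder input
inputs = processor(images=tile, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
features = outputs.last_hidden_state[:, 0, :]  # [CLS] token as the tile embedding
print(features.shape)
```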
Hubert Base Audioset
Audio representation model based on HuBERT architecture, pre-trained on the complete AudioSet dataset, suitable for general audio tasks
Audio Classification
Transformers
ALM
345
2
Pubchemdeberta Augmented
TwinBooster is a DeBERTa V3 base model fine-tuned on the PubChem bioassay corpus, combining the Barlow Twins self-supervised learning method with gradient boosting to enhance molecular property prediction.
Molecular Model
Transformers English
mschuh
25
0
Japanese Hubert Base
Apache-2.0
Japanese HuBERT base model trained by rinna Co., Ltd. on approximately 19,000 hours of the Japanese speech corpus ReazonSpeech v1.
Speech Recognition
Transformers Japanese
rinna
4,550
68
Data2vec Vision Base Ft1k
Apache-2.0
Data2Vec-Vision is a self-supervised learning model based on the BEiT architecture, fine-tuned on the ImageNet-1k dataset, suitable for image classification tasks.
Image Classification
Transformers
facebook
7,520
2
Data2vec Vision Large Ft1k
Apache-2.0
Data2Vec-Vision is a self-supervised learning vision model based on the BEiT architecture, fine-tuned on the ImageNet-1k dataset, suitable for image classification tasks.
Image Classification
Transformers
facebook
68
5
Data2vec Vision Large
Apache-2.0
Data2Vec-Vision is a self-supervised learning model based on the BEiT architecture, pre-trained on the ImageNet-1k dataset, suitable for image classification tasks.
Image Classification
Transformers
facebook
225
2
Data2vec Vision Base
Apache-2.0
Data2Vec-Vision is a self-supervised learning model based on the BEiT architecture, pretrained on the ImageNet-1k dataset, suitable for image classification tasks.
Image Classification
Transformers
facebook
427
3
Wav2vec2 Large 10min Lv60 Self
Apache-2.0
A large speech recognition model based on the Wav2Vec2 architecture, pre-trained on Libri-Light (LV-60k) and fine-tuned on 10 minutes of labeled data with a self-training objective; suitable for 16kHz sampled speech audio.
Speech Recognition
Transformers English
Splend1dchan
177
0
Data2vec Audio Large 960h
Apache-2.0
Data2Vec is a general self-supervised learning framework applicable to speech, vision, and language tasks. This large audio model is pre-trained and fine-tuned on 960 hours of LibriSpeech data, specifically optimized for automatic speech recognition tasks.
Speech Recognition
Transformers English
facebook
2,531
7
Data2vec Audio Large
Apache-2.0
Data2Vec-Audio-Large is a large model pre-trained on 16kHz sampled speech audio using a self-supervised learning framework, suitable for tasks such as speech recognition.
Speech Recognition
Transformers English
facebook
97
1
Data2vec Text Base
MIT
A language model pre-trained on English text with the data2vec objective, a general self-supervised learning framework that handles different modalities through a unified approach.
Large Language Model
Transformers English
facebook
1,796
12
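The text checkpoint is a pre-trained encoder meant for fine-tuning or feature extraction rather than direct generation. A sketch pulling sentence features with the generic Auto classes, assuming the hub ID is `facebook/data2vec-text-base`:

```python
# Sketch: sentence embeddings from the pre-trained Data2Vec text encoder.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "facebook/data2vec-text-base"  # assumed hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("Self-supervised learning scales well.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
sentence_embedding = outputs.last_hidden_state.mean(dim=1)  # mean-pooled tokens
print(sentence_embedding.shape)
```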
Large
Apache-2.0
A Transformer model pre-trained on an English corpus with an ELECTRA-like objective, learning intrinsic representations of the English language through self-supervision.
Large Language Model
Transformers English
funnel-transformer
190
2
Dino Vits16
Apache-2.0
A self-supervised Vision Transformer model trained using the DINO method, suitable for image feature extraction
Image Classification
Transformers
facebook
47.32k
16
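DINO ViT-S/16 is a plain ViT encoder, so it is used for feature extraction rather than classification. A sketch assuming the hub ID is `facebook/dino-vits16`; `example.jpg` is a placeholder:

```python
# Sketch: global image features from the DINO ViT-S/16 encoder.
import torch
from PIL import Image
from transformers import AutoImageProcessor, ViTModel

model_id = "facebook/dino-vits16"  # assumed hub ID
processor = AutoImageProcessor.from_pretrained(model_id)
model = ViTModel.from_pretrained(model_id)

image = Image.open("example.jpg").convert("RGB")  # placeholder input
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
cls_embedding = outputs.last_hidden_state[:, 0, :]  # [CLS] token
print(cls_embedding.shape)
```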
Wav2vec2 Spanish
Pre-trained speech recognition model based on Common Voice Spanish data, trained on TPU using the Flax framework
Speech Recognition Spanish
flax-community
16
2
Albert Fa Base V2 Clf Digimag
Apache-2.0
The first lightweight ALBERT model for the Persian language, based on Google's ALBERT BASE v2.0.
Large Language Model
Transformers Other
m3hrdadfi
14
0
Wav2vec2 Large Es Voxpopuli
Large-scale speech pre-training model trained on the Spanish subset of the VoxPopuli corpus, suitable for Spanish speech recognition tasks
Speech Recognition Spanish
facebook
117.04k
1
Xlarge
Apache-2.0
Funnel Transformer is an English pre-trained model based on self-supervised learning with an ELECTRA-like objective, achieving efficient language processing by filtering out sequential redundancy.
Large Language Model
Transformers English
funnel-transformer
31
1
Tf Xlm Roberta Base
XLM-RoBERTa is a cross-lingual sentence encoder trained on 2.5TB of data covering 100 languages, achieving excellent performance on multiple cross-lingual benchmarks.
Large Language Model
Transformers
jplu
4,820
1
Hubert Large Ls960 Ft
Apache-2.0
HuBERT-Large is a self-supervised speech representation learning model fine-tuned on 960 hours of LibriSpeech data for automatic speech recognition tasks.
Speech Recognition
Transformers English
facebook
776.27k
66
Tf Xlm Roberta Large
XLM-RoBERTa is a large-scale cross-lingual sentence encoder, trained on 2.5TB of data across 100 languages, achieving excellent performance in multiple cross-lingual benchmarks.
Large Language Model
Transformers
jplu
236
1
Core Clinical Diagnosis Prediction
The CORe model is based on BioBERT and further pre-trained on clinical data with a clinical outcome pre-training objective; it predicts ICD9 diagnosis codes from admission notes.
Text Classification
Transformers English
DATEXIS
789
32
Papugapt2
A Polish text generation model based on the GPT-2 architecture, filling a gap in Polish NLP; trained on the multilingual OSCAR corpus.
Large Language Model Other
flax-community
804
11
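As a GPT-2-style causal LM, the model works with the standard text-generation pipeline. A sketch assuming the hub ID is `flax-community/papuGaPT2` and that PyTorch weights are available in the repo:

```python
# Sketch: Polish text generation with a GPT-2-style checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="flax-community/papuGaPT2")  # assumed hub ID
result = generator("Najlepszym polskim miastem jest", max_new_tokens=30, do_sample=True)
print(result[0]["generated_text"])
```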
S2t Wav2vec2 Large En De
MIT
A Transformer-based end-to-end model designed for English-to-German speech translation.
Speech Recognition
Transformers Supports Multiple Languages
facebook
817
4
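The checkpoint pairs a wav2vec 2.0 speech encoder with an autoregressive text decoder, so the ASR pipeline returns German text for English audio. A sketch assuming the hub ID is `facebook/s2t-wav2vec2-large-en-de`; the audio path is a placeholder:

```python
# Sketch: English speech in, German text out, via the ASR pipeline.
from transformers import pipeline

translator = pipeline(
    "automatic-speech-recognition",
    model="facebook/s2t-wav2vec2-large-en-de",              # assumed hub ID
    feature_extractor="facebook/s2t-wav2vec2-large-en-de",  # same repo for the feature extractor
)
print(translator("english_speech_16khz.wav"))  # placeholder audio file
```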